Introduction

In this report we analyze a data set from the Center for Policing Equity (CPE), a consortium of research scientists that promotes police transparency and accountability through innovation and collaboration between law enforcement agencies and the communities they serve. The data set is a collection of standardized police behavioral data. We use it to look for problems in the system, such as racism in the police department, and to extract insights through visualization. The ultimate goal is to inform police agencies where they can make improvements by identifying deployment areas where racial disparities exist and are not explainable by crime rates and poverty levels.

We remove every column in which more than 60% of the values are NA. For the remaining columns in which fewer than 10% of the values are NA, we fill the missing values with the column median. After cleaning, the data set has 2383 rows and 38 columns, of which 13 are continuous variables and 25 are categorical. The variables available for visualization are summarized in the table below. An asterisk marks a non-numeric variable that was converted to numeric codes for the summary, so its statistics are not directly interpretable.
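
The cleaning rule described above can be sketched in a few lines. This is an illustrative Python sketch using only the standard library, not the code used to produce this report; the function name `clean_columns`, the column `MOSTLY_EMPTY` and all values in `toy` are hypothetical. It assumes missing values are represented as `None` and leaves columns with 10% to 60% missing values untouched, since the report does not say how those are handled.

```python
from statistics import median

def clean_columns(table, drop_threshold=0.60, fill_threshold=0.10):
    """Drop columns with more than drop_threshold missing values, then
    median-fill numeric columns with fewer than fill_threshold missing."""
    cleaned = {}
    for name, values in table.items():
        share_missing = sum(v is None for v in values) / len(values)
        if share_missing > drop_threshold:
            continue  # column is too sparse to keep
        present = [v for v in values if v is not None]
        is_numeric = all(isinstance(v, (int, float)) for v in present)
        if 0 < share_missing < fill_threshold and is_numeric:
            fill = median(present)
            values = [fill if v is None else v for v in values]
        cleaned[name] = values
    return cleaned

# Toy data, invented for illustration (not taken from the CPE data set).
toy = {
    "OFFICER_YEARS_ON_FORCE": [2, None, 6, 10, 4, 8, 3, 7, 5, 9, 6],
    "MOSTLY_EMPTY": [None, None, None, None, None, None, None, None, 1, 2, 3],
}
result = clean_columns(toy)
```

In the toy input, `MOSTLY_EMPTY` is dropped because 8 of its 11 values are missing, and the single gap in `OFFICER_YEARS_ON_FORCE` is filled with the median of the observed values.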

variable vars n mean sd median trimmed mad min max range skew kurtosis se
INCIDENT_DATE 1 2383 NaN NA NA NaN NA Inf -Inf -Inf NA NA NA
INCIDENT_TIME* 2 2383 270.839698 1.595198e+02 275.00000 270.743052 209.0466000 1.00000 543.00000 5.42000e+02 0.0094140 -1.2458302 3.2677785
UOF_NUMBER* 3 2383 1163.611834 6.714928e+02 1166.00000 1163.887258 859.9080000 1.00000 2328.00000 2.32700e+03 -0.0039594 -1.2017925 13.7555931
OFFICER_ID 4 2383 9572.125472 1.534883e+03 10115.00000 9823.988988 1083.7806000 0.00000 11170.00000 1.11700e+04 -1.6462990 3.8716754 31.4422288
OFFICER_GENDER* 5 2383 1.899287 3.010120e-01 2.00000 1.998951 0.0000000 1.00000 2.00000 1.00000e+00 -2.6518482 5.0344118 0.0061663
OFFICER_RACE* 6 2383 5.045741 1.285130e+00 6.00000 5.219192 0.0000000 1.00000 6.00000 5.00000e+00 -0.8456296 -0.7470492 0.0263260
OFFICER_HIRE_DATE* 7 2383 150.966009 8.608945e+01 169.00000 152.265338 100.8168000 1.00000 291.00000 2.90000e+02 -0.1429733 -1.2386832 1.7635506
OFFICER_YEARS_ON_FORCE 8 2383 8.049098 7.562481e+00 6.00000 6.646565 5.9304000 0.00000 36.00000 3.60000e+01 1.4833403 1.4677128 0.1549181
OFFICER_INJURY* 9 2383 1.098196 2.976413e-01 1.00000 1.000000 0.0000000 1.00000 2.00000 1.00000e+00 2.6987908 5.2856902 0.0060972
OFFICER_INJURY_TYPE* 10 2383 49.748216 1.048442e+01 52.00000 52.000000 0.0000000 1.00000 76.00000 7.50000e+01 -3.7374257 14.3606302 0.2147744
OFFICER_HOSPITALIZATION* 11 2383 1.020143 1.405177e-01 1.00000 1.000000 0.0000000 1.00000 2.00000 1.00000e+00 6.8269807 44.6263931 0.0028785
SUBJECT_ID 12 2383 40255.407889 1.240602e+04 44573.00000 43652.611956 2043.0228000 0.00000 47972.00000 4.79720e+04 -2.4671205 4.8364561 254.1384834
SUBJECT_RACE* 13 2383 4.052455 1.542627e+00 3.00000 3.819612 0.0000000 1.00000 7.00000 6.00000e+00 1.2030960 -0.2228576 0.0316009
SUBJECT_GENDER* 14 2383 1.820394 3.979000e-01 2.00000 1.894075 0.0000000 1.00000 4.00000 3.00000e+00 -1.3654857 1.0723672 0.0081510
SUBJECT_INJURY* 15 2383 1.263953 4.408666e-01 1.00000 1.205034 0.0000000 1.00000 2.00000 1.00000e+00 1.0703824 -0.8546395 0.0090312
SUBJECT_INJURY_TYPE* 16 2383 108.264373 4.129537e+01 122.00000 115.830100 0.0000000 1.00000 193.00000 1.92000e+02 -1.6392554 2.0976724 0.8459397
SUBJECT_WAS_ARRESTED* 17 2383 1.859421 3.476598e-01 2.00000 1.949135 0.0000000 1.00000 2.00000 1.00000e+00 -2.0667910 2.2725791 0.0071218
SUBJECT_DESCRIPTION* 18 2383 9.242551 5.211150e+00 11.00000 9.552701 4.4478000 1.00000 15.00000 1.40000e+01 -0.6223513 -1.2286417 0.1067509
SUBJECT_OFFENSE* 19 2383 250.870331 1.769465e+02 304.00000 248.448873 220.9074000 1.00000 551.00000 5.50000e+02 -0.0999534 -1.3374383 3.6247660
REPORTING_AREA 20 2383 3190.562736 1.936015e+03 2231.00000 2944.816466 1700.5422000 1001.00000 9611.00000 8.61000e+03 1.1669871 1.4928651 39.6594516
BEAT 21 2383 392.772556 2.104614e+02 351.00000 383.353959 292.0722000 111.00000 757.00000 6.46000e+02 0.2649576 -1.2814252 4.3113212
SECTOR 22 2383 389.022241 2.105877e+02 350.00000 379.480860 296.5200000 110.00000 750.00000 6.40000e+02 0.2677882 -1.2869836 4.3139098
DIVISION* 23 2383 3.688208 2.137520e+00 3.00000 3.610383 2.9652000 1.00000 7.00000 6.00000e+00 0.1453711 -1.4108341 0.0437873
LOCATION_DISTRICT* 24 2383 7.785145 3.652288e+00 7.00000 7.841636 4.4478000 1.00000 14.00000 1.30000e+01 -0.0682703 -1.0446330 0.0748175
STREET_NUMBER 25 2383 4903.800671 4.532293e+03 3415.00000 4297.367593 3536.0010000 0.00000 54023.00000 5.40230e+04 2.3118702 12.7090834 92.8444450
STREET_NAME* 26 2383 530.386488 3.026965e+02 524.00000 527.276350 381.0282000 1.00000 1080.00000 1.07900e+03 0.0787616 -1.1560959 6.2007659
STREET_DIRECTION* 27 2383 2.986572 7.621312e-01 3.00000 2.981647 0.0000000 1.00000 5.00000 4.00000e+00 0.0395708 2.3493465 0.0156123
STREET_TYPE* 28 2383 12.185481 6.737242e+00 13.00000 12.456738 8.8956000 1.00000 22.00000 2.10000e+01 -0.2966993 -1.4051343 0.1380131
LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION* 29 2383 642.278640 3.961413e+02 631.00000 638.865758 520.3926000 1.00000 1322.00000 1.32100e+03 0.0558407 -1.2824814 8.1149925
LOCATION_CITY* 30 2383 1.000000 0.000000e+00 1.00000 1.000000 0.0000000 1.00000 1.00000 0.00000e+00 NaN NaN 0.0000000
LOCATION_STATE* 31 2383 1.000000 0.000000e+00 1.00000 1.000000 0.0000000 1.00000 1.00000 0.00000e+00 NaN NaN 0.0000000
LOCATION_LATITUDE 32 2383 32.801958 8.529450e-02 32.78406 32.796274 0.0764651 32.63318 33.01519 3.82007e-01 0.6003629 -0.1638629 0.0017473
LOCATION_LONGITUDE 33 2383 -96.783915 6.431190e-02 -96.79111 -96.785531 0.0489050 -96.95503 -96.57442 3.80608e-01 0.2897695 -0.0177765 0.0013174
INCIDENT_REASON* 34 2383 5.850608 4.359516e+00 3.00000 5.544835 1.4826000 1.00000 14.00000 1.30000e+01 0.4032136 -1.6900293 0.0893051
REASON_FOR_FORCE* 35 2383 4.963911 3.340519e+00 3.00000 4.624541 2.9652000 1.00000 12.00000 1.10000e+01 0.7544860 -0.6290041 0.0684308
TYPE_OF_FORCE_USED1* 36 2383 21.129249 8.794610e+00 27.00000 22.242790 2.9652000 1.00000 29.00000 2.80000e+01 -0.7590019 -1.0215376 0.1801584
NUMBER_EC_CYCLES* 37 2383 11.419220 2.245828e+00 12.00000 12.000000 0.0000000 1.00000 12.00000 1.10000e+01 -3.7111750 12.0582682 0.0460060
FORCE_EFFECTIVE* 38 2383 59.353756 2.324481e+01 69.00000 61.410593 20.7564000 1.00000 104.00000 1.03000e+02 -0.6866249 -0.2577183 0.4761721

Visualizations

This graph shows where police officers of different races are located and how many years they have spent at that location. It describes officer behavior by location and indicates whether officers of one race tend to stay longer at a particular location than officers of another race. The most spread-out group is “American Ind” officers, which may be because there are few of them on the force, and the central part of the area is dominated by white officers.

A boxplot is a standardized way of displaying the distribution of a data set, including its spread and outliers. Here it shows, for officers of each race, the distribution of time spent on the force. White officers tend to spend much longer on the force than other officers, and the group also contains many outliers who have served longer than the typical officer. Hispanic officers show the same pattern. In both groups many points lie outside the typical range, which is not the case for officers of other races.
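
To make the notion of an outlier concrete, the quantities behind a boxplot can be computed directly. The Python sketch below is illustrative only and is not the code behind the plot in this report. It assumes the common convention that points more than 1.5 times the interquartile range beyond the quartiles are drawn as outliers. The function name `boxplot_summary` and the values in `toy_years` are hypothetical.

```python
from statistics import quantiles

def boxplot_summary(values):
    """Return the quartiles and the points outside the 1.5 * IQR fences."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return {"q1": q1, "median": med, "q3": q3, "outliers": outliers}

# Toy years-on-force values, invented for illustration.
toy_years = [1, 2, 3, 4, 5, 6, 7, 8, 30]
summary = boxplot_summary(toy_years)
```

For the toy values the quartiles are 3 and 7, so the upper fence is 13 and the officer with 30 years on the force is the only point that would be drawn as an outlier.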

This graph shows which incident reasons were recorded in each sector. A tile chart gives a clear picture of how those calls are distributed across the sectors. “Traffic Stop” is the most common incident reason across all sectors, with tile colors ranging from light to dark. “Crowd Control” and “Accidental Discharge” have the fewest calls, whereas “Arrest” and “Call for Cover” are among the most frequently called incident reasons.
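
Each tile in such a chart is colored by the number of records that share one sector and one incident reason. The Python sketch below shows that counting step only. It is not the plotting code used for this report, and the `records` list is toy data invented for illustration.

```python
from collections import Counter

# Toy (sector, incident reason) pairs, invented for illustration.
records = [
    (110, "Arrest"),
    (110, "Traffic Stop"),
    (120, "Arrest"),
    (110, "Arrest"),
    (120, "Call for Cover"),
]

# One count per (sector, reason) combination; each count drives one tile.
tile_counts = Counter(records)

# A Counter returns 0 for combinations that never occur (an empty tile).
empty_tile = tile_counts[(120, "Traffic Stop")]
```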

A facet grid shows several graphs at once, which makes it easy to compare categories. In this graph we look at the injury pattern of officers over their years of duty, split by gender and race. For white female officers we found a positive trend in injuries with time spent on the force, and the same holds for Black female and male officers. For Asian female officers we did not find any injury trend, but there is an injury trend for Asian male officers.

In this graph we plot categorical variables against each other to look for relationships between them: subject race against subject gender, with respect to reporting area. Male subjects of the different races are reported across all reporting areas, and Black and white female subjects show a similar pattern. The pattern is different for Hispanic female subjects. Surprisingly, no data points appear for “American Ind” and “Asian” female subjects.

In the next graph we combine a box plot and a scatter plot to show the distribution of injury by gender across sectors. The data set contains 440 females, 1932 males, 10 null values and 1 unknown value. The distributions for males and females are spread about equally across the sectors for injury, with males stretching slightly further than females.
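
The gender counts quoted above can be checked against the row count and turned into shares with a short calculation. The Python sketch below uses only the four counts reported in this section. The dictionary keys are labels chosen for illustration and may not match the labels in the data set.

```python
# Gender counts as reported in this section.
gender_counts = {"Female": 440, "Male": 1932, "NULL": 10, "Unknown": 1}

# The counts should add up to the number of rows after cleaning (2383).
total = sum(gender_counts.values())

# Share of each category as a percentage of all rows.
shares = {label: 100 * count / total for label, count in gender_counts.items()}
```

The counts sum to 2383, matching the number of rows after cleaning. Males account for roughly 81% of the records and females for roughly 18%, which is worth keeping in mind when comparing the spread of the two groups.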

In this chart we look at the timeline of crimes over the period covered by our data set. To keep the visualization understandable, only a range of values from the data set is shown. The number of crimes peaks in June, October, March and August, in descending order. For the rest of the year the crime rate stays mostly the same, except for the first months of the year, when it is relatively low. Activity also decreases toward the end of the year.
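
A monthly ranking like the one described above comes from counting records per calendar month. The Python sketch below shows one way to do that. It is not the code behind the chart, the dates in `incident_dates` are invented for illustration, and it assumes dates are available as ISO `YYYY-MM-DD` strings, which may not be how the data set stores them.

```python
from collections import Counter
from datetime import date

# Toy incident dates, invented for illustration.
incident_dates = [
    "2016-06-03", "2016-06-17", "2016-10-09",
    "2016-03-21", "2016-06-30", "2016-10-11",
]

# Count records per calendar month (1 = January, ..., 12 = December).
per_month = Counter(date.fromisoformat(d).month for d in incident_dates)

# Months ordered from most to fewest records.
ranked = per_month.most_common()
```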

Next, we look at subject description with respect to subject gender. Among female subjects, “Mentally unstable” is the most common description, followed by “Unknown” and “Alcohol”. Among male subjects, the most common descriptions are “Alcohol”, “Unknown” and “Unknown Drugs”, followed by “Mentally unstable”. “Motor Vehicle” is the least common description, followed by “Gun” and “Weapon”.

Here we plot a correlation matrix of the columns that describe the officers, to see whether the officer attributes are related. Male officers tend to stay on the force longer than female officers, and injuries are also more frequent among male officers than among female officers. Interestingly, the level of injury is higher in an officer's first years on the force: less experienced officers tend to get injured more, and the injury trend decreases as years on the force increase.
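
Each cell of a correlation matrix is a correlation coefficient between two columns. The Python sketch below computes the Pearson coefficient for one pair, assuming that is the coefficient used (the report does not state it). The function name `pearson` and the toy values are hypothetical, with injury coded as 1 for injured and 0 for not injured.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Toy data, invented for illustration: injuries concentrated in early years.
years_on_force = [1, 2, 3, 10, 15, 20]
injured = [1, 1, 0, 0, 0, 0]
r = pearson(years_on_force, injured)
```

A negative coefficient, as in the toy data, corresponds to the pattern described above, in which injuries become less frequent as years on the force increase.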

This pie chart shows the percentage of subjects by racial background. The largest share, around 55%, belongs to subjects whose racial identity is “Black”. The next largest, around 20%, belongs to subjects whose racial identity is “white”. Subjects of every other racial identity each stay below 15%.